git-annex for computer scientists
Joey Hess
Distribits 2025
git-annex ecosystem
joey@darkstar:~/src/git-annex/doc>ls git-annex*.mdwn | wc -l
66
joey@darkstar:~/src/git-annex/doc>cat git-annex*.mdwn | wc -l
joey@darkstar:~/src/git/Documentation>ls git*.adoc | wc -l
joey@darkstar:~/src/git/Documentation>cat git*.adoc | wc -l
https://eagain.net/articles/git-for-computer-scientists/
Git for Computer Scientists
Abstract
Quick introduction to git internals for people who are not scared by words like Directed Acyclic Graph.
Storage
In simplified form, git object storage is "just" a DAG of objects, with a handful of different types of objects. They are all stored compressed and identified by
an SHA-1 hash (that, incidentally, isn't the SHA-1 of the contents of the file they represent, but of their representation in git).
blob: The simplest object, just a bunch of bytes. This is often a file, but can be a symlink or pretty much anything else. The object that points to the blob
determines the semantics.
tree: Directories are represented by tree object. They refer to blobs that have the contents of files (filename, access mode, etc is all stored in the tree), and
to other trees for subdirectories.
git-annex for computer scientists
objects
pointers
remotes
metadata
1. annex objects
Files
Often large
Not stored in git repository
Named by hash of content
SHA1-s105906176--65b01484ae2cf78b96060b51b653b7801c81e77b
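The naming scheme on this slide can be sketched in Python. This is a minimal illustration, assuming the `SHA1-s<size>--<hash>` layout visible in the example key above; `annex_key` is a hypothetical helper for this sketch, not a git-annex API.

```python
import hashlib


def annex_key(content: bytes) -> str:
    """Build a name shaped like the slide's example key.

    Hypothetical helper: the layout (backend name, size in bytes,
    hex SHA-1 of the content) is read off the example key on the slide.
    """
    digest = hashlib.sha1(content).hexdigest()
    return f"SHA1-s{len(content)}--{digest}"


if __name__ == "__main__":
    # Identical content always yields the identical name.
    print(annex_key(b"hello"))
```

Because the name depends only on the content, two repositories holding the same file agree on its name without talking to each other.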
2. pointers
* Checked into git repository
* Point to an annex object
* Often a symlink, can also be a pointer file
foo -> .git/annex/.../SHA1-s105906176--65b01484ae2cf78b96060b51b653b7801c81e77b
* Can point to an object that is not present in the local repository
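A pointer whose object is absent behaves like a dangling symlink. The sketch below shows that with plain Python; the target path under `.git/annex/objects` is made up for the example (the slide elides the real path), and `make_pointer` and `is_present` are hypothetical names.

```python
import os
import tempfile


def make_pointer(repo: str, name: str, key: str) -> str:
    """Create a symlink `name` pointing at where the object would live.

    The target layout is invented for this sketch; it is not the real
    git-annex object path.
    """
    target = os.path.join(".git", "annex", "objects", key)
    link = os.path.join(repo, name)
    os.symlink(target, link)
    return link


def is_present(link: str) -> bool:
    """True only if the symlink's target actually exists on disk."""
    return os.path.exists(link)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as repo:
        link = make_pointer(repo, "foo", "SHA1-s5--example")
        # The pointer exists as a symlink even though the content is absent.
        print(os.path.islink(link), is_present(link))
```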
3. remotes
* git remotes
* special remotes
* Each repository has a UUID
* Each stores some set of annex objects
[Diagram: origin]
4. metadata
* stored in git, in the git-annex branch
* tracks the locations of each object
* and other information
* auto-merged without conflicts (CRDT)
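The conflict-free merge can be illustrated with a toy model. This is a simplified sketch, assuming location tracking can be modeled as "newest entry per repository UUID wins"; it is not git-annex's actual log format or merge code, and `merge` and `LocationLog` are names invented here.

```python
from typing import Dict, Tuple

# uuid -> (timestamp, present?): a simplified stand-in for location tracking
LocationLog = Dict[str, Tuple[float, bool]]


def merge(a: LocationLog, b: LocationLog) -> LocationLog:
    """Merge two logs; for each repository UUID the newest entry wins.

    Ties on the timestamp are broken by tuple comparison, so the result
    does not depend on which side is merged first.
    """
    merged = dict(a)
    for uuid, entry in b.items():
        if uuid not in merged or entry > merged[uuid]:
            merged[uuid] = entry
    return merged


if __name__ == "__main__":
    ours = {"uuid-1": (1.0, True)}
    theirs = {"uuid-1": (2.0, False), "uuid-2": (3.0, True)}
    print(merge(ours, theirs) == merge(theirs, ours))
```

The merge is commutative, associative, and idempotent, which is what lets two clones combine their records in any order without a conflict.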
example
> git-annex get
get foo (from bar) ok
foo -> .git/annex/.../SHA1-s10596-65b0148
[Diagram: origin, bar]
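The steps behind `get foo (from bar) ok` can be sketched as a lookup followed by a copy. This toy model assumes each remote is just a set of key names and picks the first remote that holds the key; how git-annex actually chooses a remote is not shown on the slide, and `find_source` and `get` are hypothetical.

```python
from typing import Dict, Optional, Set


def find_source(key: str, remotes: Dict[str, Set[str]]) -> Optional[str]:
    """Return the name of the first remote known to hold `key`, if any."""
    for name, keys in remotes.items():
        if key in keys:
            return name
    return None


def get(key: str, local: Set[str], remotes: Dict[str, Set[str]]) -> str:
    """Make `key` present locally and report where it came from."""
    if key in local:
        return "already present"
    source = find_source(key, remotes)
    if source is None:
        return "not available"
    local.add(key)  # stands in for transferring the object's content
    return f"from {source}"


if __name__ == "__main__":
    remotes = {"origin": set(), "bar": {"SHA1-s10596-65b0148"}}
    print(get("SHA1-s10596-65b0148", set(), remotes))
```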
Questions?
objects: large files, named by hash of content
pointers: checked into git, symlinks
remotes: have UUIDs, store objects
metadata: git-annex branch
new* git-annex features
* Compute special remote
* Proxies
* Mask special remote
* Clusters
* since Distribits 2024
Compute special remote
“Stores” annex objects by remembering how to compute them from inputs
Inputs can include other objects, as well as values
Each compute special remote uses a compute program git-annex-compute-foo
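The idea of "storing" an object as a recipe can be shown with a toy class. This is a conceptual sketch only: it does not implement the interface between git-annex and a `git-annex-compute-` program, and `ToyComputeRemote` and its methods are invented for illustration.

```python
from typing import Callable, Dict, List, Tuple

Recipe = Tuple[Callable[..., bytes], List[str]]


class ToyComputeRemote:
    """Remembers how to compute an object instead of storing its bytes."""

    def __init__(self, objects: Dict[str, bytes]) -> None:
        self.objects = objects  # input objects available elsewhere, by name
        self.recipes: Dict[str, Recipe] = {}

    def add_computed(self, output: str, program: Callable[..., bytes],
                     inputs: List[str]) -> None:
        """Record the program and input names that produce `output`."""
        self.recipes[output] = (program, inputs)

    def get(self, output: str) -> bytes:
        """Recompute `output` on demand from its recorded inputs."""
        program, inputs = self.recipes[output]
        return program(*(self.objects[name] for name in inputs))


if __name__ == "__main__":
    remote = ToyComputeRemote({"foo.txt": b"abc"})
    remote.add_computed("foo.upper", bytes.upper, ["foo.txt"])
    print(remote.get("foo.upper"))
```

Nothing is stored for `foo.upper` except the recipe, so the remote costs almost no space as long as the inputs stay retrievable.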
Compute special remote
> git-annex initremote foo type=compute program=git-annex-compute-imageconvert
[Diagram: origin, imageconvert]
Compute special remote
> git-annex addcomputed --to=imageconvert foo.jpeg foo.gif
foo.jpeg -> .git/annex/.../SHA1-s10596-65b01484
foo.gif -> .git/annex/.../SHA1-s9213--d3628129
[Diagram: origin, imageconvert]
Proxies
> git-annex get foo.jpeg --from=bar-imageconvert
[Diagram: bar-imageconvert, imageconvert]
Mask special remote
* Adds a layer of encryption to another remote
* Allows mixing encrypted and unencrypted annex objects in a single remote
SHA1-s105906176--65b01484ae2cf78b96060b51b653b7801c81e77b
GPGHMACSHA1--2c5d3a70f60ca854c26b0829542c1147a87a37a5
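The relationship between the two names above can be sketched with Python's `hmac` module. This assumes, as the `GPGHMACSHA1` prefix suggests, that the encrypted name is an HMAC-SHA1 of the plain key under a secret; the secret used here and the exact bytes fed to the HMAC are assumptions, so this will not reproduce the digest on the slide. `masked_key_name` is a hypothetical helper.

```python
import hashlib
import hmac


def masked_key_name(key: str, secret: bytes) -> str:
    """Derive an opaque name for `key` that reveals nothing without `secret`.

    Assumption: HMAC-SHA1 over the UTF-8 key name, formatted to match the
    shape of the example on the slide.
    """
    digest = hmac.new(secret, key.encode("utf-8"), hashlib.sha1).hexdigest()
    return f"GPGHMACSHA1--{digest}"


if __name__ == "__main__":
    plain = "SHA1-s105906176--65b01484ae2cf78b96060b51b653b7801c81e77b"
    print(masked_key_name(plain, b"example secret"))
```

Since plain and masked names have visibly different prefixes, both kinds of object can sit side by side in one underlying remote without colliding.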
Mask special remote
> git annex initremote foo-encrypted type=mask remote=foo encryption=hybrid keyid=...
initremote foo ok
[Diagram: foo, foo-encrypted]
Clusters
> git-annex get foo --from=bar